60 research outputs found

    A sensor-less NBTI mitigation methodology for NoC architectures

    Get PDF
    CMOS technology improvement allows to increase the number of cores integrated on a single chip and makes Network-on-Chips (NoCs) a key component from the performance and reliability standpoints. Unfortunately, continuous scaling of CMOS technology poses severe concerns regarding failure mechanisms such as NBTI and stressmigration, that are crucial in achieving acceptable component lifetime. Process variation complicates the scenario, decreasing device lifetime and performance predictability during chip fabrication. This paper presents a novel sensor-less methodology to reduce the NBTI degradation in the on-chip network virtual channel buffers, considering process variation effects as well. Experimental validation is obtained using a cycle accurate simulator considering both real and synthetic traffic patterns. We compare our methodology to the best sensor-wise approach used as reference golden model. The proposed sensor-less strategy achieves results within 25% to the optimal sensor-wise methodology while this gap is reduced around 10% decreasing the number of virtual channels per input port. Moreover, our proposal can mitigate NBTI impact both in short and long run, since we recover both the most degraded VC (short run) as well as all the other VCs (long term)

    Modeling DVFS and Power-Gating Actuators for Cycle-Accurate NoC-Based Simulators

    Get PDF
    Networks-on-chip (NoCs) are a widely recognized viable interconnection paradigm to support the multi-core revolution. One of the major design issues of multicore architectures is still the power, which can no longer be considered mainly due to the cores, since the NoC contribution to the overall energy budget is relevant. To face both static and dynamic power while balancing NoC performance, different actuators have been exploited in literature, mainly dynamic voltage frequency scaling (DVFS) and power gating. Typically, simulation-based tools are employed to explore the huge design space by adopting simplified models of the components. As a consequence, the majority of state-of-the-art on NoC power-performance optimization do not accurately consider timing and power overheads of actuators, or (even worse) do not consider them at all, with the risk of overestimating the benefits of the proposed methodologies. This article presents a simulation framework for power-performance analysis of multicore architectures with specific focus on the NoC. It integrates accurate power gating and DVFS models encompassing also their timing and power overheads. The value added of our proposal is manyfold: (i) DVFS and power gating actuators are modeled starting from SPICE-level simulations; (ii) such models have been integrated in the simulation environment; (iii) policy analysis support is plugged into the framework to enable assessment of different policies; (iv) a flexible GALS (globally asynchronous locally synchronous) support is provided, covering both handshake and FIFO re-synchronization schemas. To demonstrate both the flexibility and extensibility of our proposal, two simple policies exploiting the modeled actuators are discussed in the article

    A Temperature and Reliability Oriented Simulation Framework for Multi-core Architectures

    Get PDF
    The increasing complexity of multi-core architectures demands for a comprehensive evaluation of different solutions and alternatives at every stage of the design process, considering different aspects at the same time. Simulation frameworks are attractive tools to fulfil this requirement, due to their flexibility. Nevertheless, state-of-the-art simulation frameworks lack a joint analysis of power, performance, temperature profile and reliability projection at system-level, focusing only on a specific aspect. This paper presents a comprehensive estimation framework that jointly exploits these design metrics at system-level, considering processing cores, interconnect design and storage elements. We describe the framework in details, and provide a set of experiments that highlight its capability and flexibility, focusing on temperature and reliability analysis of multi-core architectures supported by Network-on-Chip interconnect

    A DVFS Cycle Accurate Simulation Framework with Asynchronous NoC Design for Power-Performance Optimizations

    Get PDF
    Network-on-Chip (NoC) is a flexible and scalable solution to interconnect multi-cores, with a strong influence on the performance of the whole chip. On-chip network affects also the overall power consumption, thus requiring accurate early-stage estimation and optimization methodologies. In this scenario, the Dynamic Voltage Frequency Scaling (DVFS) technique have been proposed both for CPUs and NoCs. The promise is to be a flexible and scalable way to jointly optimize power-performance, addressing both static and dynamic power sources. Being simulation a de-facto prime solution to explore novel multi-core architectures, a reliable full system analysis requires to integrate in the toolchain accurate timing and power models for the DVFS block and for the resynchronization logic between different Voltage and Frequency Islands (VFIs). In such a way, a more accurate validation of novel optimization methodologies which exploit such actuator is possible, since both architectural and actuator overheads are considered at the same time. This work proposes a complete cycle accurate framework for multi-core design supporting Global Asynchronous Local Synchronous (GALS) NoC design and DVFS actuators for the NoC. Furthermore, static and dynamic frequency assignment is possible with or without the use of the voltage regulator. The proposed framework sits on accurate analytical timing model and SPICE-based power measures, providing accurate estimates of both timing and power overheads of the power control mechanisms

    Analysis and countermeasures to side-channel attacks: a hardware design perspective

    Get PDF
    With the Internet-of-Things revolution, the security assessment against implementation attacks has become a critical concern not only for hardware accelerators but also for embedded CPUs executing cryptographic primitives. Starting from an FPGA implementation of the open-hardware ORPSoC system-on-chip running a software version of the AES, this paper explores two implications of the side-channel information leakage that impact the hardware design of general purpose computing platforms. First, we explore the mapping between the side-channel information leakage and the microarchitectural components that are contributing to it. Second, we discuss the variations in the observability of the side-channel information leakage at post-synthesis an

    HLS-based acceleration of the BIKE post-quantum KEM on embedded-class heterogeneous SoCs

    Get PDF
    An effective transition to post-quantum cryptography mandates its deployment on embedded-class devices, guaranteeing adequate performance while satisfying their strict area constraints. This work accelerates BIKE, a QC-MDPC code-based post-quantum KEM, through HLS on embedded-class heterogeneous SoCs that couple a CPU with FPGA programmable logic. The proposed methodology implements HLS-generated accelerators to compute the most time-consuming operations of BIKE, identified by analyzing the software-only execution. The mix of accelerators instantiated in hardware and operations executed in software, as well as the configurable architectural parameters of the former, are then determined, depending on the resources available on the target SoC, to minimize BIKE’s execution time. Experiments on AMD Zynq-7000 SoCs highlight a speedup of up to 3.34 times compared to the reference software execution and up to 1.98 times over state-of-the-art HW/SW implementations targeting the same chips

    A survey on run-time power monitors at the edge

    Get PDF
    Effectively managing energy and power consumption is crucial to the success of the design of any computing system, helping mitigate the efficiency obstacles given by the downsizing of the systems while also being a valuable step towards achieving green and sustainable computing. The quality of energy and power management is strongly affected by the prompt availability of reliable and accurate information regarding the power consumption for the different parts composing the target monitored system. At the same time, effective energy and power management are even more critical within the field of devices at the edge, which exponentially proliferated within the past decade with the digital revolution brought by the Internet of things. This manuscript aims to provide a comprehensive conceptual framework to classify the different approaches to implementing run-time power monitors for edge devices that appeared in literature, leading the reader toward the solutions that best fit their application needs and the requirements and constraints of their target computing platforms. Run-time power monitors at the edge are analyzed according to both the power modeling and monitoring implementation aspects, identifying specific quality metrics for both in order to create a consistent and detailed taxonomy that encompasses the vast existing literature and provides a sound reference to the interested reader

    Thermal/performance trade-off in network-on-chip architectures

    Get PDF
    Multi-core architectures are a promising paradigm to exploit the huge integration density reached by high-performance systems. Indeed, integration density and technology scaling are causing undesirable operating temperatures, having net impact on reduced reliability and increased cooling costs. Dynamic Thermal Management (DTM) approaches have been proposed in literature to control temperature profile at run-time, while design-time approaches generally provide floorplan-driven solutions to cope with temperature constraints. Nevertheless, a suitable approach to collect performance, thermal and reliability metrics has not been proposed, yet. This work presents a novel methodology to jointly optimize temperature/performance trade-off in reliable high-performance parallel architectures with security constraints achieved by workload physical isolation on each core. The proposed methodology is based on a linear formal model relating temperature and duty-cycle on one side, and performance and duty-cycle on the other side. Extensive experimental results on real-world use-case scenarios show the goodness of the proposed model, suitable for design-time system-wide optimization to be used in conjunction with DTM technique

    A Control-based Methodology for Power-performance Optimization in NoCs Exploiting DVFS

    Get PDF
    Networks-on-Chip (NoCs) are considered a viable solution to fully exploit the computational power of multi- and many-cores, but their non negligible power consumption requires ad hoc power-performance design methodologies. In this perspective, several proposals exploited the possibility to dynamically tune voltage and frequency for the interconnect, taking steps from traditional CPU-based power management solutions. However, the impact of the actuators, i.e. the limited range of frequencies for a PLL (Phase Locked Loop) or the time to increase voltage and frequency for a Dynamic Voltage and Frequency Scaling (DVFS) modules, are often not carefully accounted for, thus overestimating the benefits. This paper presents a control-based methodology for the NoC power-performance optimization exploiting the Dynamic Frequency Scaling (DFS). Both timing and power overheads of the actuators are considered, thanks to an ad hoc simulation framework. Moreover the proposed methodology eventually allows for user and/or OS interactions to change between different high level power-performance modes, i.e. to trigger performance oriented or power saving system behaviors. Experimental validation considered a 16-core architecture comparing our proposal with different settings of threshold-based policies. We achieved a speedup up to 3 for the timing and a reduction up to 33.17% of the power ∗ time product against the best threshold-based policy. Moreover, our best control-based scheme provides an averaged power-performance product improvement of 16.50% and 34.79% against the best and the second considered threshold-based policy setting

    CUTBUF: Buffer Management and Router Design for Traffic Mixing in VNET-Based NoCs

    Get PDF
    "© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works[EN] Router's buffer design and management strongly influence energy, area and performance of on-chip networks, hence it is crucial to encompass all of these aspects in the design process. At the same time, the NoC design cannot disregard preventing network-level and protocol-level deadlocks by devoting ad-hoc buffer resources to that purpose. In chip multiprocessor systems the coherence protocol usually requires different virtual networks (VNETs) to avoid deadlocks. Moreover, VNETutilization is highly unbalanced and there is no way to share buffers between them due to the need to isolate different traffic types. This paper proposes CUTBUF, a novel NoC router architecture to dynamically assign virtual channels (VCs) to VNETs depending on the actual VNETs load to significantly reduce the number of physical buffers in routers, thus saving area and power without decreasing NoC performance. Moreover, CUTBUF allows to reuse the same buffer for different traffic types while ensuring that the optimized NoC is deadlock-free both at network and protocol level. In this perspective, all the VCs are considered spare queues not statically assigned to a specific VNETand the coherence protocol only imposes a minimum number of queues to be implemented. Synthetic applications as well as real benchmarks have been used to validate CUTBUF, considering architectures ranging from 16 up to 48 cores. Moreover, a complete RTL router has been designed to explore area and power overheads. Results highlight how CUTBUF can reduce router buffers up to 33 percent with 2 percent of performance degradation, a 5 percent of operating frequency decrease and area and power saving up to 30.6 and 30.7 percent, respectively. Conversely, the flexibility of the proposed architecture improves by 23.8 percent the performance of the baseline NoC router when the same number of buffers is used.Zoni, D.; Flich Cardo, J.; Fornaciari, W. (2016). CUTBUF: Buffer Management and Router Design for Traffic Mixing in VNET-Based NoCs. IEEE Transactions on Parallel and Distributed Systems. 27(6):1603-1616. doi:10.1109/TPDS.2015.2468716S1603161627
    • …
    corecore